A clustering approach using a combination of gravitational search algorithm and k-harmonic means and its application in text document clustering
Abstract
Data clustering is one of the most popular techniques of information management and is used in many applications of science and engineering, such as machine learning, pattern recognition, image processing, data mining, and web mining. Researchers have suggested many different algorithms, among which evolutionary algorithms perform particularly well in data clustering, especially on big datasets. It has been shown that GSA-KM, a combination of the gravitational search algorithm (GSA) and K-means (KM), is superior to some other comparable evolutionary methods. One drawback of this approach is its dependency on the initial seeds. In this paper, a method combining GSA and K-harmonic means, called GSA-KHM, is proposed, in which the dependency on initialization is reduced. The proposed GSA-KHM method has been applied to data clustering and, as a special application, to text document clustering. The simulation results show that the proposed method works better than GSA-KM and the other compared methods in both data clustering and text document clustering.
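To make the contrast between K-means and K-harmonic means concrete, the following is a minimal sketch of the KHM objective as it is commonly defined in the literature: the sum, over data points, of the harmonic mean of the p-th powers of the distances to all centers. The function names, the default value of p, and the eps guard are assumptions made for illustration and are not taken from the paper. The GSA search that the paper combines with KHM is not shown.

```python
import numpy as np

def pairwise_distances(X, centers):
    """Euclidean distances between every point and every center, shape (n, k)."""
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

def khm_objective(X, centers, p=3.5, eps=1e-12):
    """K-harmonic means objective: sum_i  k / sum_j ||x_i - c_j||^(-p).

    Every center contributes to every point through the harmonic mean,
    which is the usual explanation for KHM being less sensitive to the
    initial centers than K-means. p=3.5 is an assumed default.
    """
    d = np.maximum(pairwise_distances(X, centers), eps)  # guard against division by zero
    k = centers.shape[0]
    return float(np.sum(k / np.sum(d ** (-p), axis=1)))

def kmeans_objective(X, centers):
    """K-means objective: sum of squared distances to the nearest center only."""
    d = pairwise_distances(X, centers)
    return float(np.sum(np.min(d, axis=1) ** 2))

if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [2.0, 0.0]])
    one_center = np.array([[1.0, 0.0]])
    # With a single center and p=2 the two objectives coincide.
    print(khm_objective(X, one_center, p=2.0), kmeans_objective(X, one_center))
```

In a hybrid such as the one described, an objective of this kind would typically serve as the fitness function that the search algorithm minimizes over candidate center sets; that wiring is an assumption here, not a description of the authors' implementation.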
Similar resources
A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are used so widely in many fields, a lot of research is still going on to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of cluster centers and hence can become trapped in a local optimum. In this paper...
Tabu-KM: A Hybrid Clustering Algorithm Based on Tabu Search Approach
The clustering problem under the minimum sum-of-squares criterion is a non-convex, non-linear program with many locally optimal values, so its solution often falls into these traps and cannot converge to a globally optimal solution. In this paper, an efficient hybrid optimization algorithm, called Tabu-KM, is developed for solving this problem. It gathers the ...
Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm
Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group or cluster are more similar to each other in some sense. It is a main task of exploratory data mining and a common technique for statistical data analysis. This paper proposed an improved version of K-Means algorithm, namely Persistent K...
A Clustering Based Location-allocation Problem Considering Transportation Costs and Statistical Properties (RESEARCH NOTE)
Cluster analysis is a useful technique in multivariate statistical analysis. Different types of hierarchical cluster analysis and K-means have been used for data analysis in previous studies. However, the K-means algorithm can be improved using metaheuristic algorithms. In this study, we propose a simulated annealing based algorithm for K-means in the clustering analysis which we refer it a...
Fuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition
In this paper, the use of clustering algorithms for data fusion at the decision level is proposed. The results of automatic isolated word recognition, derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined using fuzzy clustering algorithms, especially fuzzy k-means and fuzzy vector quantization. Experimental results show that the...